BACKGROUND OF THE INVENTION

This invention is directed to methods for 
simultaneous identification of differentially expressed 
mRNAs, as well as measurements of their relative 
concentrations.

An ultimate goal of biochemical research ought 
to be a complete characterization of the protein 
molecules that make up an organism. This would include 
their identification, sequence determination, 
demonstration of their anatomical sites of expression, 
elucidation of their biochemical activities, and 
understanding of how these activities determine 
organismic physiology. For medical applications, the 
description should also include information about how the 
concentration of each protein changes in response to 
pharmaceutical or toxic agents. 

Let us consider the scope of the problem: How
many genes are there? The issue of how many genes are
expressed in a mammal is still unsettled after at least
two decades of study. There are few direct studies that
address patterns of gene expression in different tissues.
Mutational load studies (J.O. Bishop, "The Gene Numbers
Game," Cell 2:81-86 (1974); T. Ohta & M. Kimura,
"Functional Organization of Genetic Material as a Product
of Molecular Evolution," Nature 233:118-119 (1971)) have
suggested that there are between 3x10^4 and 10^5 essential
genes.

Before cDNA cloning techniques, information on 
gene expression came from RNA complexity studies: analog 
measurements (measurements in bulk) based on observations 
of mixed populations of RNA molecules with different 
specificities and abundances. To an unexpected extent,
early analog complexity studies were distorted by hidden
complications of the fact that the molecules in each
tissue that make up most of its mRNA mass comprise only a
small fraction of its total complexity. Later, cDNA
cloning allowed digital measurements (i.e., sequence-
specific measurements on individual species) to be made;
hence, more recent concepts about mRNA expression are
based upon actual observations of individual RNA species.

Brain, liver, and kidney are the mammalian 
tissues that have been most extensively studied by analog 
RNA complexity measurements. The lowest estimates of 
complexity are those of Hastie and Bishop (N.D. Hastie &
J.O. Bishop, "The Expression of Three Abundance Classes
of Messenger RNA in Mouse Tissues," Cell 9:761-774
(1976)), who suggested that 26x10^6 nucleotides of the
3x10^9 base pair rodent genome were expressed in brain,
23x10^6 in liver, and 22x10^6 in kidney, with nearly
complete overlap in RNA sets. This indicates a very
minimal number of tissue-specific mRNAs. However,
experience has shown that these values must clearly be
underestimates, because many mRNA molecules, which were
probably of abundances below the detection limits of this
early study, have been shown to be expressed in brain but
detectable in neither liver nor kidney. Many other
researchers (J.A. Bantle & W.E. Hahn, "Complexity and
Characterization of Polyadenylated RNA in the Mouse
Brain," Cell 8:139-150 (1976); D.M. Chikaraishi,
"Complexity of Cytoplasmic Polyadenylated and Non-
Adenylated Rat Brain Ribonucleic Acids," Biochemistry
18:3249-3256 (1979)) have measured analog complexities of
between 100-200x10^6 nucleotides in brain, and 2-to-3-fold
lower estimates in liver and kidney. Of the brain mRNAs,
50-65% are detected in neither liver nor kidney. These
values have been supported by digital cloning studies
(R.J. Milner & J.G. Sutcliffe, "Gene Expression in Rat
Brain," Nucl. Acids Res. 11:5497-5520 (1983)).

Analog measurements on bulk mRNA suggested that 
the average mRNA length was between 1400-1900 
nucleotides. In a systematic digital analysis of brain 
mRNA length using 200 randomly selected brain cDNAs to 
measure RNA size by northern blotting (Milner & 
Sutcliffe, supra ) , it was found that, when the mRNA size 
data were weighted for RNA prevalence, the average length 
was 1790 nucleotides, the same as that determined by 
analog measurements. However, the mRNAs that made up 
most of the brain mRNA complexity had an average length 
of 5000 nucleotides. Not only were the rarer brain RNAs 
longer, but they tended to be brain specific, while the 
more prevalent brain mRNAs were more ubiquitously 
expressed and were much shorter on average. 

These concepts about mRNA lengths have been 
corroborated more recently from the length of brain mRNA 
whose sequences have been determined (J.G. Sutcliffe, 
"mRNA in the Mammalian Central Nervous System," Annu. 
Rev. Neurosci. 11:157-198 (1988)). Thus, the 1-2x10^8
nucleotide complexity and 5000-nucleotide average mRNA
length calculate to an estimated 30,000 mRNAs expressed
in the brain, of which about 2/3 are not detected in
liver or kidney. Brain apparently accounts for a
considerable portion of the tissue-specific genes of 
mammals. Most brain mRNAs are expressed at low 
concentration. There are no total-mammal mRNA complexity 
measurements, nor is it yet known whether 5000 
nucleotides is a good mRNA-length estimate for non-neural 
tissues. A reasonable estimate of total gene number 
might be between 50,000 and 100,000. 
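
The arithmetic behind the 30,000 figure can be checked with a short sketch. It only divides the complexity range quoted above by the 5000-nucleotide average length; taking the midpoint of the range as the headline estimate is an assumption made here for illustration, and the variable names are hypothetical.

```python
# Rough estimate of the number of distinct brain mRNAs from the figures quoted above.
complexity_range_nt = (1e8, 2e8)  # 1-2 x 10^8 nucleotides of brain mRNA complexity
mean_length_nt = 5000             # average length of the mRNAs that dominate complexity

low, high = (c / mean_length_nt for c in complexity_range_nt)
midpoint = (low + high) / 2        # assumption: midpoint taken as the point estimate
brain_specific = midpoint * 2 / 3  # about two-thirds not detected in liver or kidney

print(f"{low:,.0f} to {high:,.0f} mRNAs (midpoint {midpoint:,.0f})")
print(f"about {brain_specific:,.0f} not detected in liver or kidney")
```

Under these figures the range is 20,000 to 40,000 species, which is consistent with the estimate of about 30,000 brain mRNAs.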

What is most needed to advance a chemical
understanding of physiological function is a menu of
protein sequences encoded by the genome plus the cell 
types in which each is expressed. At present, protein 
sequences can be reliably deduced only from cDNAs, not 
from genes, because of the presence of the intervening 
sequences (introns) in the genomic sequences. Even the 
complete nucleotide sequence of a mammalian genome will 
not substitute for characterization of its expressed 
sequences. Therefore, a systematic strategy for 
collecting transcribed sequences and demonstrating their 
sites of expression is needed. Such a strategy would be 
of particular use in determining sequences expressed 
differentially within the brain. It is necessarily an 
eventual goal of such a study to achieve closure; that 
is, to identify all mRNAs. Closure can be difficult to 
obtain due to the differing prevalence of various mRNAs 
and the large number of distinct mRNAs expressed by many 
distinct tissues. The effort to obtain it allows one to 
obtain a progressively more reliable description of the 
dimensions of gene space. 

Studies carried out in the laboratory of Craig
Venter (M.D. Adams et al., "Complementary DNA Sequencing:
Expressed Sequence Tags and Human Genome Project,"
Science 252:1651-1656 (1991); M.D. Adams et al.,
"Sequence Identification of 2,375 Human Brain Genes,"
Nature 355:632-634 (1992)) have resulted in the isolation
of randomly chosen cDNA clones of human brain mRNAs, the
determination of short single-pass sequences of their 3'-
ends, about 300 base pairs, and a compilation of some
2500 of these as a database of "expressed sequence tags."
This database, while useful, fails to provide any 
knowledge of differential expression. It is therefore 
important to be able to recognize genes based on their 
overall pattern of expression within regions of brain and 
other tissues and in response to various paradigms, such 
as various physiological or pathological states or the 
effects of drug treatment, rather than simply their 
expression in a single tissue. 

Other work has focused on the use of the 
polymerase chain reaction (PCR) to establish a database. 
Williams et al. (J.G.K. Williams et al., "DNA 
Polymorphisms Amplified by Arbitrary Primers Are Useful 
as Genetic Markers," Nucl. Acids Res. 18:6531-6535 
(1990)) and Welsh & McClelland (J. Welsh & M. McClelland,
"Genomic Fingerprinting Using Arbitrarily Primed PCR and 
a Matrix of Pairwise Combinations of Primers," Nucl. 
Acids Res. 18:7213-7218 (1990)) showed that single 10-mer 
primers of arbitrarily chosen sequences, i.e., any 10-mer 
primer off the shelf, when used for PCR with complex DNA 
templates such as human, plant, yeast, or bacterial 
genomic DNA, gave rise to an array of PCR products. The 
priming events were demonstrated to involve incomplete 
complementarity between the primer and the template DNA.
Presumably, partially mismatched primer-binding sites are
randomly distributed through the genome. Occasionally,
two of these sites in opposing orientation were located
closely enough together to give rise to a PCR product
band. There were on average 8-10 products, which varied
in size from about 0.4 to about 4 kb and had different
mobilities for each primer. The array of PCR products
exhibited differences among individuals of the same 
species. These authors proposed that the single 
arbitrary primers could be used to produce restriction 
fragment length polymorphism (RFLP)-like information for
genetic studies. Others have applied this technology 
(S.R. Woodward et al., "Random Sequence Oligonucleotide 
Primers Detect Polymorphic DNA Products Which Segregate 
in Inbred Strains of Mice," Mamm. Genome 3:73-78 (1992); 
J.H. Nadeau et al., "Multilocus Markers for Mouse Genome 
Analysis: PCR Amplification Based on Single Primers of 
Arbitrary Nucleotide Sequence," Mamm. Genome 3:55-64
(1992)).

Two groups (J. Welsh et al., "Arbitrarily 
Primed PCR Fingerprinting of RNA," Nucl. Acids Res. 
20:4965-4970 (1992); P. Liang & A.B. Pardee, 
"Differential Display of Eukaryotic Messenger RNA by 
Means of the Polymerase Chain Reaction," Science 257:967- 
971 (1992)) adapted the method to compare mRNA 
populations. In the study of Liang and Pardee, this 
method, called mRNA differential display, was used to 
compare the population of mRNAs expressed by two related 
cell types, normal and tumorigenic mouse A31 cells. For 
each experiment, they used one arbitrary 10-mer as the 
5'-primer and an oligonucleotide complementary to a 
subset of poly A tails as a 3' anchor primer, performing 
PCR amplification in the presence of 35S-dNTPs on cDNAs
prepared from the two cell types. The products were
resolved on sequencing gels and 50-100 bands ranging from
100-500 nucleotides were observed. The bands presumably
resulted from amplification of cDNAs corresponding to the
3'-ends of mRNAs that contain the complement of the 3'
anchor primer and a partially mismatched 5' primer site,
as had been observed on genomic DNA templates. For each
primer pair, the pattern of bands amplified from the two 
cDNAs was similar, with the intensities of about 80% of 
the bands being indistinguishable. Some of the bands 
were more intense in one or the other of the PCR samples; 
a few were detected in only one of the two samples. 

Further studies (P. Liang et al., "Distribution 
and Cloning of Eukaryotic mRNAs by Means of Differential 
Display: Refinements and Optimization," Nucl. Acids Res.
21:3269-3275 (1993)) have demonstrated that the procedure
works with low concentrations of input RNA (although it
is not quantitative for rarer species), and that the
specificity resides primarily in the last nucleotide of
the 3' anchor primer. At least a third of identified
differentially detected PCR products correspond to 
differentially expressed RNAs, with a false positive rate 
of at least 25%. 

If all of the 50,000 to 100,000 mRNAs of the 
mammal were accessible to this arbitrary-primer PCR 
approach, then about 80-95 5' arbitrary primers and 12 3' 
anchor primers would be required in about 1000 PCR panels 
and gels to give a likelihood, calculated by the Poisson 
distribution, that about two-thirds of these mRNAs would 
be identified. 
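
One way to read this estimate is sketched below. It is an interpretation, not the calculation stated in the text: it assumes each panel displays roughly 50-100 bands (the range reported above for differential display) and that bands fall on mRNA species independently and uniformly, so that the number of bands per species is Poisson-distributed. The function name and parameters are hypothetical.

```python
import math

def fraction_detected(n_panels, bands_per_panel, n_mrnas):
    """Poisson estimate of the fraction of mRNA species seen at least once.

    Assumes bands land on species uniformly at random, so the mean number
    of bands per species is lam = total bands / number of species.
    """
    lam = n_panels * bands_per_panel / n_mrnas
    return 1.0 - math.exp(-lam)

# 12 anchor primers combined with 80-95 arbitrary primers give about 1000 panels.
print(12 * 80, 12 * 95)

# About one band per mRNA species on average in either scenario.
for bands, mrnas in [(50, 50_000), (100, 100_000)]:
    print(bands, mrnas, round(fraction_detected(1000, bands, mrnas), 3))
```

With a mean of one band per species, the detected fraction is 1 - 1/e, or about 63%, which is in line with the "about two-thirds" figure. Real priming is far from uniform, which is the point taken up in the next paragraph.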

It is unlikely that all mRNAs are amenable to 
detection by this method for the following reasons. For
an mRNA to surface in such a survey, it must be prevalent 
enough to produce a signal on the autoradiograph and 
contain a sequence in its 3' 500 nucleotides capable of 
serving as a site for mismatched primer binding and 
priming. The more prevalent an individual mRNA species, 
the more likely it would be to generate a product. Thus,
prevalent species may give bands with many different
arbitrary primers. Because this latter property would
contain an unpredictable element of chance based on
selection of the arbitrary primers, it would be difficult
to approach closure by the arbitrary primer method.
Also, for the information to be portable from one
laboratory to another and reliable, the mismatched
priming must be highly reproducible under different
laboratory conditions using different PCR machines, with
the resulting slight variation in reaction conditions. As
the basis for mismatched priming is poorly understood, 
this is a drawback of building a database from data 
obtained by the Liang & Pardee differential display 
method. 

There is therefore a need for an improved 
method of differential display of mRNA species that 
reduces the uncertain aspect of 5'-end generation and
allows data to be absolutely reproducible in different
settings. Preferably, such a method does not depend on
potentially irreproducible mismatched priming.
Preferably, such a method reduces the number of PCR
panels and gels required for a complete survey and allows
double-strand sequence data to be rapidly accumulated.
Preferably, such an improved method also reduces, if not
eliminates, the number of concurrent signals obtained
from the same species of mRNA.



SUMMARY 

We have developed an improved method for the 
simultaneous sequence-specific identification of mRNAs in 
a mRNA population. In general, this method comprises:

(1) preparing double-stranded cDNAs from a mRNA
population using a mixture of 12 anchor primers, the
anchor primers each including: (i) a tract of from 7 to
40 T residues; (ii) a site for cleavage by a restriction
endonuclease that recognizes more than six bases, the
site for cleavage being located to the 5'-side of the
tract of T residues; (iii) a stuffer segment of from 4 to
40 nucleotides, the stuffer segment being located to the
5'-side of the site for cleavage by the restriction
endonuclease; and (iv) phasing residues -V-N located at
the 3' end of each of the anchor primers, wherein V is a
deoxyribonucleotide selected from the group consisting of
A, C, and G; and N is a deoxyribonucleotide selected from
the group consisting of A, C, G, and T, the mixture
including anchor primers containing all possibilities for
V and N;

(2) producing cloned inserts from a suitable
host cell that has been transformed by a vector, the
vector having the cDNA sample that has been cleaved with
a first restriction endonuclease and a second restriction
endonuclease inserted therein, the cleaved cDNA sample
being inserted in the vector in an orientation that is
antisense with respect to a bacteriophage-specific
promoter within the vector, the first restriction
endonuclease recognizing a four-nucleotide sequence and
the second restriction endonuclease cleaving at a single
site within each member of the mixture of anchor primers;

(3) generating linearized fragments of the
cloned inserts by digestion with at least one restriction
endonuclease that is different from the first and second
restriction endonucleases;

(4) generating a cRNA preparation of antisense
cRNA transcripts by incubation of the linearized
fragments with a bacteriophage-specific RNA polymerase
capable of initiating transcription from the
bacteriophage-specific promoter;

(5) dividing the cRNA preparation into sixteen
subpools and transcribing first-strand cDNA from each
subpool, using a thermostable reverse transcriptase and
one of sixteen primers whose 3'-terminus is -N-N, wherein
N is one of the four deoxyribonucleotides A, C, G, or T,
the primer being at least 15 nucleotides in length,
corresponding in sequence to the 3'-end of the
bacteriophage-specific promoter, and extending across
into at least the first two nucleotides of the cRNA, the
mixture including all possibilities for the 3'-terminal
two nucleotides;

(6) using the product of transcription in each
of the sixteen subpools as a template for a polymerase
chain reaction with a 3'-primer that corresponds in
sequence to a sequence in the vector adjoining the site
of insertion of the cDNA sample in the vector and a 5'-
primer selected from the group consisting of: (i) the
primer from which first-strand cDNA was made for that
subpool; (ii) the primer from which the first-strand cDNA
was made for that subpool extended at its 3'-terminus by
an additional residue -N, where N can be any of A, C, G,
or T; and (iii) the primer used for the synthesis of
first-strand cDNA for that subpool extended at its 3'-
terminus by two additional residues -N-N, wherein N can
be any of A, C, G, or T, to produce polymerase chain
reaction amplified fragments; and

(7) resolving the polymerase chain reaction 
amplified fragments by electrophoresis to display bands 
representing the 3'-ends of mRNAs present in the sample. 
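
The way steps (5) and (6) partition a cRNA pool can be sketched as follows. This is a simplified model, not the laboratory procedure: it assumes perfect-match priming only, represents each insert by a made-up sequence written in the primer's orientation and starting at the first nucleotide past the vector junction, and uses the constant part of SEQ ID NO: 3 as the primer core. The function and clone names are hypothetical.

```python
from collections import defaultdict

PRIMER_CORE = "AGGTCGACGGTATCGG"  # SEQ ID NO: 3 without its 3'-terminal -N-N

def subpool_primer(insert_start, extra=0):
    """Return the 5'-primer that would match this insert perfectly.

    insert_start: insert sequence, 5'->3' in the primer's orientation,
                  beginning at the first nucleotide after the vector junction.
    extra: 0, 1, or 2 additional 3' residues, as in options (i)-(iii) of step (6).
    """
    n = 2 + extra
    if len(insert_start) < n:
        raise ValueError("insert too short to assign a primer")
    return PRIMER_CORE + insert_start[:n]

def group_by_primer(inserts, extra=0):
    """Group clone names by the primer expected to amplify them."""
    groups = defaultdict(list)
    for name, seq in inserts.items():
        groups[subpool_primer(seq, extra)].append(name)
    return dict(groups)

# Made-up insert starts, for illustration only.
inserts = {"clone_a": "ACGTTAGC", "clone_b": "ACCTGATC", "clone_c": "GTTACCAG"}
print(group_by_primer(inserts))           # clone_a and clone_b share the -A-C primer
print(group_by_primer(inserts, extra=1))  # a third residue separates them
```

In this model, each added 3' residue divides a subpool by four, so a crowded lane can be resolved into less crowded ones without changing the template.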

Typically, the anchor primers each have 18 T
residues in the tract of T residues, and the stuffer
segment of the anchor primers is 14 residues in length.
A suitable sequence for the stuffer segment is A-A-C-T-G-
G-A-A-G-A-A-T-T-C (SEQ ID NO: 1).

Typically, the site for cleavage by a
restriction endonuclease that recognizes more than six
bases is the NotI cleavage site. In this case, suitable
anchor primers have the sequence A-A-C-T-G-G-A-A-G-A-A-T-
T-C-G-C-G-G-C-C-G-C-A-G-G-A-A-T-T-T-T-T-T-T-T-T-
T-T-T-T-T-T-T-T-T-V-N (SEQ ID NO: 2).
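
As an illustration, the 12 anchor primers covered by SEQ ID NO: 2 can be enumerated by expanding the -V-N phasing residues. The sketch below splits the sequence into named parts for readability; the part names are labels chosen here, and the T-tract length of 18 follows SEQ ID NO: 2 as it is written out in the primer panels later in this summary.

```python
from itertools import product

STUFFER = "AACTGGAAGAATTC"  # SEQ ID NO: 1, 14 residues
NOTI_SITE = "GCGGCCGC"      # NotI recognition sequence, 8 bases
SPACER = "AGGAA"            # residues between the NotI site and the T tract
T_TRACT = "T" * 18          # T tract as written out in SEQ ID NO: 2

def anchor_primers():
    """Expand the 3'-terminal -V-N into all 3 x 4 = 12 anchor primers."""
    return [STUFFER + NOTI_SITE + SPACER + T_TRACT + v + n
            for v, n in product("ACG", "ACGT")]

primers = anchor_primers()
print(len(primers), len(primers[0]))
for p in primers:
    print(p[-2:], p)
```

Excluding T at the V position keeps the penultimate residue from pairing within the poly(A) tail, so each primer is phased to the junction between the tail and the body of the mRNA.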

Typically, the bacteriophage-specific promoter
is selected from the group consisting of T3 promoter and 
T7 promoter. Most typically, it is the T3 promoter. 

Typically, the sixteen primers for priming of 
transcription of cDNA from cRNA have the sequence A-G-G- 
T-C-G-A-C-G-G-T-A-T-C-G-G-N-N (SEQ ID NO: 3).

The vector can be the plasmid pBC SK+ cleaved
with ClaI and NotI, in which case the 3'-primer in step
(6) can be G-A-A-C-A-A-A-A-G-C-T-G-G-A-G-C-T-C-C-A-C-C-G-
C (SEQ ID NO: 4).

The first restriction endonuclease recognizing
a four-nucleotide sequence is typically MspI;
alternatively, it can be TaqI or HinP1I. The restriction
endonuclease cleaving at a single site in each of the
mixture of anchor primers is typically NotI.

Typically, the mRNA population has been 
enriched for polyadenylated mRNA species. 

A typical host cell is a strain of Escherichia
coli.

The step of generating linearized fragments of 
the cloned inserts typically comprises: 

(a) dividing the plasmid containing the
insert into two fractions, a first fraction cleaved with
the restriction endonuclease XhoI and a second fraction
cleaved with the restriction endonuclease SalI;

(b) recombining the first and second
fractions after cleavage;

(c) dividing the recombined fractions into
thirds and cleaving the first third with the restriction
endonuclease HindIII, the second third with the
restriction endonuclease BamHI, and the third third with
the restriction endonuclease EcoRI; and

(d) recombining the thirds after digestion 
in order to produce a population of linearized fragments 
of which about one-sixth of the population corresponds to 
the product of cleavage by each of the possible 
combinations of enzymes.
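
The one-sixth figure follows from crossing the two-way split with the three-way split. A minimal sketch, with enzyme names written in their standard forms and equal fractions assumed at each division:

```python
from fractions import Fraction
from itertools import product

first_cut = ["XhoI", "SalI"]                # step (a): two equal fractions
second_cut = ["HindIII", "BamHI", "EcoRI"]  # step (c): three equal fractions

# Share of the final population produced by each pair of enzymes.
shares = {
    pair: Fraction(1, len(first_cut)) * Fraction(1, len(second_cut))
    for pair in product(first_cut, second_cut)
}

for pair, share in shares.items():
    print(pair, share)
```

Six enzyme pairs result, each contributing one-sixth of the linearized fragments.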

Typically, the step of resolving the polymerase 
chain reaction amplified fragments by electrophoresis 
comprises electrophoresis of the fragments on at least 
two gels. 

The method can further comprise determining the 
sequence of the 3'-end of at least one of the mRNAs, such 
as by: 

(1) eluting at least one cDNA corresponding to 
a mRNA from an electropherogram in which bands 
representing the 3'-ends of mRNAs present in the sample
are displayed; 

(2) amplifying the eluted cDNA in a polymerase 
chain reaction; 

(3) cloning the amplified cDNA into a plasmid; 

(4) producing DNA corresponding to the cloned 
DNA from the plasmid; and 

(5) sequencing the cloned cDNA. 

Another aspect of the invention is a method of 
simultaneous sequence-specific identification of mRNAs 
corresponding to members of an antisense cRNA pool 
representing the 3'-ends of a population of mRNAs, the
antisense cRNAs that are members of the antisense cRNA
pool being terminated at their 5'-end with a primer
sequence corresponding to a bacteriophage-specific vector
and at their 3'-end with a sequence corresponding in
sequence to a sequence of the vector. The method
comprises:

(1) dividing the members of the antisense cRNA
pool into sixteen subpools and transcribing first-strand
cDNA from each subpool, using a thermostable reverse
transcriptase and one of sixteen primers whose 3'-
terminus is -N-N, wherein N is one of the four
deoxyribonucleotides A, C, G, or T, the primer being at
least 15 nucleotides in length, corresponding in sequence
to the 3'-end of the bacteriophage-specific promoter, and
extending across into at least the first two nucleotides
of the cRNA, the mixture including all possibilities for
the 3'-terminal two nucleotides;

(2) using the product of transcription in each
of the sixteen subpools as a template for a polymerase
chain reaction with a 3'-primer that corresponds in
sequence to a sequence in the vector adjoining the site of
insertion of the cDNA sample in the vector and a 5'-
primer selected from the group consisting of: (i) the
primer from which first-strand cDNA was made for that
subpool; (ii) the primer from which the first-strand cDNA
was made for that subpool extended at its 3'-terminus by
an additional residue -N, where N can be any of A, C, G,
or T; and (iii) the primer used for the synthesis of
first-strand cDNA for that subpool extended at its 3'-
terminus by two additional residues -N-N, wherein N can
be any of A, C, G, or T, to produce polymerase chain
reaction amplified fragments; and

(3) resolving the polymerase chain reaction
amplified fragments by electrophoresis to display bands
representing the 3'-ends of mRNAs present in the sample.

Yet another aspect of the present invention is
a method for detecting a change in the pattern of mRNA 
expression in a tissue associated with a physiological or 
pathological change. This method comprises the steps of: 

(1) obtaining a first sample of a tissue that 
is not subject to the physiological or pathological 
change; 

(2) determining the pattern of mRNA expression
in the first sample of the tissue by performing steps
(1)-(3) of the method described above for simultaneous
sequence-specific identification of mRNAs corresponding
to members of an antisense cRNA pool representing the 3'-
ends of a population of mRNAs to generate a first display
of bands representing the 3'-ends of mRNAs present in the
first sample;

(3) obtaining a second sample of the tissue 
that has been subject to the physiological or 
pathological change; 

(4) determining the pattern of mRNA expression
in the second sample of the tissue by performing steps
(1)-(3) of the method described above for simultaneous
sequence-specific identification of mRNAs corresponding
to members of an antisense cRNA pool to generate a second
display of bands representing the 3'-ends of mRNAs
present in the second sample; and

(5) comparing the first and second displays to
determine the effect of the physiological or pathological
change on the pattern of mRNA expression in the tissue.

The comparison is typically made in adjacent
lanes.

The tissue can be derived from the central 
nervous system or from particular structures within the 
central nervous system. The tissue can alternatively be 
derived from another organ or organ system. 
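
The comparison in step (5) can be pictured with a short sketch. Everything in it is an illustrative assumption: bands are keyed by a hypothetical pair of the primer's 3'-terminal residues and the fragment length, intensities are made-up numbers, and the two-fold threshold is arbitrary. The method itself describes a visual comparison of adjacent lanes.

```python
def compare_displays(first, second, fold=2.0):
    """Classify bands whose intensity differs between two displays.

    first, second: dicts mapping a band identifier to a measured intensity.
    fold: assumed minimum ratio for a band to count as changed.
    """
    changed = {}
    for band in sorted(set(first) | set(second)):
        a = first.get(band, 0.0)
        b = second.get(band, 0.0)
        if a == 0.0 and b == 0.0:
            continue
        if a == 0.0:
            changed[band] = "only in second"
        elif b == 0.0:
            changed[band] = "only in first"
        elif b / a >= fold:
            changed[band] = "increased"
        elif a / b >= fold:
            changed[band] = "decreased"
    return changed

# Made-up intensities for illustration only.
control = {("AC", 212): 10.0, ("AC", 340): 5.0, ("GT", 158): 8.0}
treated = {("AC", 212): 11.0, ("AC", 340): 15.0, ("TG", 401): 6.0}
print(compare_displays(control, treated))
```

Bands that fall below the threshold in both directions are treated as unchanged and omitted from the result.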

Another aspect of the present invention is a
method of screening for a side effect of a drug. The 
method can comprise the steps of: 

(1) obtaining a first sample of tissue from an 
organism treated with a compound of known physiological 
function; 

(2) determining the pattern of mRNA expression
in the first sample of the tissue by performing steps
(1)-(3) of the method described above for simultaneous
sequence-specific identification of mRNAs corresponding
to members of an antisense cRNA pool to generate a first
display of bands representing the 3'-ends of mRNAs
present in the first sample;

(3) obtaining a second sample of tissue from 
an organism treated with a drug to be screened for a side 
effect; 

(4) determining the pattern of mRNA expression
in the second sample of the tissue by performing steps
(1)-(3) of the method described above for simultaneous
sequence-specific identification of mRNAs corresponding
to members of an antisense cRNA pool to generate a second
display of bands representing the 3'-ends of mRNAs
present in the second sample; and

(5) comparing the first and second displays in 
order to detect the presence of mRNA species whose 
expression is not affected by the known compound but is 
affected by the drug to be screened, thereby indicating a 
difference in action of the drug to be screened and the 
known compound and thus a side effect. 

The drug to be screened can be a drug affecting 
the central nervous system, such as an antidepressant, a
neuroleptic, a tranquilizer, an anticonvulsant, a 
monoamine oxidase inhibitor, or a stimulant. 
Alternatively, the drug can be another class of drug such 
as an anti-parkinsonism agent, a skeletal muscle 
relaxant, an analgesic, a local anesthetic, a
cholinergic, an antispasmodic, a steroid, or a non- 
steroidal anti-inflammatory drug. 

Another aspect of the present invention is 
panels of primers and degenerate mixtures of primers 
suitable for the practice of the present invention. 
These include: 

(1) a panel of primers comprising 16 primers of
the sequence A-G-G-T-C-G-A-C-G-G-T-A-T-C-G-G-N-N (SEQ ID
NO: 3), wherein N is one of the four deoxyribonucleotides
A, C, G, or T;

(2) a panel of primers comprising 64 primers of
the sequences A-G-G-T-C-G-A-C-G-G-T-A-T-C-G-G-N-N-N (SEQ
ID NO: 5), wherein N is one of the four
deoxyribonucleotides A, C, G, or T;

(3) a panel of primers comprising 256 primers
of the sequences A-G-G-T-C-G-A-C-G-G-T-A-T-C-G-G-N-N-N-N
(SEQ ID NO: 6), wherein N is one of the four
deoxyribonucleotides A, C, G, or T;

(4) a panel of primers comprising 12 primers
of the sequences A-A-C-T-G-G-A-A-G-A-A-T-T-C-G-C-G-G-C-C-
G-C-A-G-G-A-A-T-T-T-T-T-T-T-T-T-T-T-T-T-T-T-T-T-T-V-N
(SEQ ID NO: 2), wherein V is a deoxyribonucleotide
selected from the group consisting of A, C, and G; and N
is a deoxyribonucleotide selected from the group
consisting of A, C, G, and T; and

(5) a degenerate mixture of primers comprising
a mixture of 12 primers of the sequences A-A-C-T-G-G-A-A-
G-A-A-T-T-C-G-C-G-G-C-C-G-C-A-G-G-A-A-T-T-T-T-T-T-T-T-T-
T-T-T-T-T-T-T-T-T-V-N (SEQ ID NO: 2), wherein V is a
deoxyribonucleotide selected from the group consisting of
A, C, and G; and N is a deoxyribonucleotide selected from
the group consisting of A, C, G, and T, each of the 12
primers being present in about an equimolar quantity.
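
The sizes of the 5'-primer panels follow from the number of variable 3' residues: four choices at each of two, three, or four positions. A brief sketch that enumerates them, using the constant portion shared by SEQ ID NO: 3, 5, and 6 (the function name is hypothetical):

```python
from itertools import product

CORE = "AGGTCGACGGTATCGG"  # constant 16-nucleotide portion of SEQ ID NO: 3, 5, and 6

def primer_panel(n_variable):
    """All primers with n_variable 3'-terminal residues drawn from A, C, G, T."""
    return [CORE + "".join(tail) for tail in product("ACGT", repeat=n_variable)]

for n in (2, 3, 4):
    panel = primer_panel(n)
    print(n, len(panel), panel[0], panel[-1])
```

The panels therefore contain 4^2 = 16, 4^3 = 64, and 4^4 = 256 primers respectively.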

BRIEF DESCRIPTION OF THE DRAWINGS



These and other features, aspects, and 
advantages of the present invention will become better 
understood with reference to the following description, 
appended claims, and accompanying drawings where: 

Figure 1 is a diagrammatic depiction of the 
method of the present invention showing the various 
stages of priming, cleavage, cloning and amplification; 
and 

Figure 2 is an autoradiogram of a gel showing
the result of performing the method of the present
invention using several 5'-primers in the PCR step
corresponding to known sequences of brain mRNAs and using
liver and brain mRNA as starting material.

DESCRIPTION 

We have developed a method for simultaneous 
sequence-specific identification and display of mRNAs in 
a mRNA population. 

This method has a number of applications in
drug screening, the study of physiological and
pathological conditions, and genomic mapping. These
applications are discussed below.

I. SIMULTANEOUS SEQUENCE-SPECIFIC IDENTIFICATION OF
mRNAs

A method according to the present invention, 
based on the polymerase chain reaction (PCR) technique, 
provides means for visualization of nearly every mRNA 
expressed by a tissue as a distinct band on a gel whose 
intensity corresponds roughly to the concentration of the 
mRNA. The method is based on the observation that
virtually all mRNAs conclude with a 3'-poly(A) tail but
does not rely on the specificity of primer binding to the 
tail. 

In general, the method comprises: 

(1) preparing double-stranded cDNAs from a mRNA
population using a mixture of 12 anchor primers, the
anchor primers each including: (i) a tract of from 7 to
40 T residues; (ii) a site for cleavage by a restriction
endonuclease that recognizes more than six bases, the
site for cleavage being located to the 5'-side of the
tract of T residues; (iii) a stuffer segment of from 4 to
40 nucleotides, the stuffer segment being located to the
5'-side of the site for cleavage by the restriction
endonuclease; and (iv) phasing residues -V-N located at
the 3' end of each of the anchor primers, wherein V is a
deoxyribonucleotide selected from the group consisting of
A, C, and G; and N is a deoxyribonucleotide selected from
the group consisting of A, C, G, and T, the mixture
including anchor primers containing all possibilities for
V and N;

(2) producing cloned inserts from a suitable
host cell that has been transformed by a vector, the
vector having the cDNA sample that has been cleaved with
a first restriction endonuclease and a second restriction
endonuclease inserted therein, the cleaved cDNA sample
being inserted in the vector in an orientation that is
antisense with respect to a bacteriophage-specific
promoter within the vector, the first restriction
endonuclease recognizing a four-nucleotide sequence and
the second restriction endonuclease cleaving at a single
site within each member of the mixture of anchor primers;

(3) generating linearized fragments of the
cloned inserts by digestion with at least one restriction
endonuclease that is different from the first and second
restriction endonucleases;

(4) generating a cRNA preparation of antisense
cRNA transcripts by incubation of the linearized
fragments with a bacteriophage-specific RNA polymerase
capable of initiating transcription from the
bacteriophage-specific promoter;

(5) dividing the cRNA preparation into sixteen
subpools and transcribing first-strand cDNA from each
subpool, using a thermostable reverse transcriptase and
one of sixteen primers whose 3'-terminus is -N-N, wherein
N is one of the four deoxyribonucleotides A, C, G, or T,
the primer being at least 15 nucleotides in length,
corresponding in sequence to the 3'-end of the
bacteriophage-specific promoter, and extending across
into at least the first two nucleotides of the cRNA, the
mixture including all possibilities for the 3'-terminal
two nucleotides;

(6) using the product of transcription in each
of the sixteen subpools as a template for a polymerase
chain reaction with a 3'-primer that corresponds in
sequence to a sequence in the vector adjoining the site
of insertion of the cDNA sample in the vector and a 5'-
primer selected from the group consisting of: (i) the
primer from which first-strand cDNA was made for that
subpool; (ii) the primer from which the first-strand cDNA
was made for that subpool extended at its 3'-terminus by
an additional residue -N, where N can be any of A, C, G,
or T; and (iii) the primer used for the synthesis of
first-strand cDNA for that subpool extended at its 3'-
terminus by two additional residues -N-N, wherein N can
be any of A, C, G, or T, to produce polymerase chain
reaction amplified fragments; and

(7) resolving the polymerase chain reaction 
amplified fragments by electrophoresis to display bands 
representing the 3'-ends of mRNAs present in the sample.

A depiction of this scheme is shown in Figure 1.



A. Isolation of mRNA 

The first step in the method is isolation or
provision of a mRNA population. Methods of extraction of
RNA are well-known in the art and are described, for
example, in J. Sambrook et al., "Molecular Cloning: A
Laboratory Manual" (Cold Spring Harbor Laboratory Press,
Cold Spring Harbor, New York, 1989), vol. 1, ch. 7,
"Extraction, Purification, and Analysis of Messenger RNA
from Eukaryotic Cells," incorporated herein by this
reference. Other isolation and extraction methods are
also well-known. Typically, isolation is performed in
the presence of chaotropic agents such as guanidinium 
chloride or guanidinium thiocyanate, although other 
detergents and extraction agents can alternatively be 
used. 

Typically, the mRNA is isolated from the total 
extracted RNA by chromatography over oligo(dT)-cellulose
or other chromatographic media that have the capacity to
bind the polyadenylated 3'-portion of mRNA molecules.
Alternatively, but less preferably, total RNA can be
used. However, it is generally preferred to isolate
poly(A)+ RNA.

B. Preparation of Double-Stranded cDNA 

Double-stranded cDNAs are then prepared from 
the mRNA population using a mixture of twelve anchor 
primers to initiate reverse transcription. The anchor 
primers each include: (i) a tract of from 7 to 40 T
residues; (ii) a site for cleavage by a restriction
endonuclease that recognizes more than six bases, the
site for cleavage being located to the 5'-side of the
tract of T residues; (iii) a stuffer segment of from 4 to
40 nucleotides, the stuffer segment being located to the
5'-side of the site for cleavage by the restriction
endonuclease; and (iv) phasing residues -V-N located at
the 3' end of each of the anchor primers, wherein V is a
deoxyribonucleotide selected from the group consisting of
A, C, and G; and N is a deoxyribonucleotide selected from
the group consisting of A, C, G, and T. The mixture
includes anchor primers containing all possibilities for 
V and N. 

Typically, the anchor primers each have 18 T
residues in the tract of T residues, and the stuffer
segment of the anchor primers is 14 residues in length.
A suitable sequence of the stuffer segment is A-A-C-T-G-
G-A-A-G-A-A-T-T-C (SEQ ID NO: 1). Typically, the site
for cleavage by a restriction endonuclease that
recognizes more than six bases is the NotI cleavage site.
A preferred set of anchor primers has the sequence A-A-C-
T-G-G-A-A-G-A-A-T-T-C-G-C-G-G-C-C-G-C-A-G-G-A-A-T-T-T-T-
T-T-T-T-T-T-T-T-T-T-T-T-T-T-V-N (SEQ ID NO: 2).
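
The composition of the twelve-member anchor primer mixture can be illustrated with a short sketch. The Python below is offered only as an illustration and is not part of the method itself; the constant and function names are hypothetical, and the five residues between the NotI site and the T tract are simply read off SEQ ID NO: 2 as printed above.

```python
from itertools import product

# Segments read from SEQ ID NO: 2 as printed in the text (names are illustrative).
STUFFER = "AACTGGAAGAATTC"   # SEQ ID NO: 1, the 14-residue stuffer segment
NOTI_SITE = "GCGGCCGC"       # NotI recognition sequence
SPACER = "AGGAA"             # residues between the NotI site and the T tract
T_TRACT = "T" * 18           # tract of 18 T residues

def anchor_primers():
    """Enumerate every -V-N phasing combination (V = A, C, G; N = A, C, G, T)."""
    return [
        STUFFER + NOTI_SITE + SPACER + T_TRACT + v + n
        for v, n in product("ACG", "ACGT")
    ]

primers = anchor_primers()
for p in primers:
    print(p)
```

Three choices for V and four for N give the twelve primers; excluding T at the V position is what prevents a primer from annealing at arbitrary positions inside the poly(A) tail.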

One member of this mixture of twelve anchor 
primers initiates synthesis at a fixed position at the 
3'-end of all copies of each mRNA species in the sample,
thereby defining a 3'-end point for each species.

This reaction is carried out under conditions 
for the preparation of double-stranded cDNA from mRNA 
that are well-known in the art. Such techniques are 
described, for example, in Volume 2 of J. Sambrook et
al., "Molecular Cloning: A Laboratory Manual", entitled
"Construction and Analysis of cDNA Libraries."
Typically, reverse transcriptase from avian
myeloblastosis virus is used.

C. Cleavage of the cDNA Sample With Restriction
Endonucleases



The cDNA sample is cleaved with two restriction 
endonucleases. The first restriction endonuclease is an 
endonuclease that recognizes a 4-nucleotide sequence. 
This typically cleaves at multiple sites in most cDNAs.
The second restriction endonuclease cleaves at a single
site within each member of the mixture of anchor primers.
Typically, the first restriction endonuclease is MspI and
the second restriction endonuclease is NotI. The enzyme
NotI does not cleave within most cDNAs. This is desirable
to minimize the loss of cloned inserts that would result 
from cleavage of the cDNAs at locations other than in the 
anchor site. 

Alternatively, the first restriction
endonuclease can be TaqI or HinP1I. The use of the
latter two restriction endonucleases can detect rare
mRNAs that are not cleaved by MspI. The first
restriction endonuclease generates a 5'-overhang
compatible for cloning into the desired vector, as
discussed below. This cloning, for the pBC SK+ vector,
is into the ClaI site, as discussed below.

Conditions for digestion of the cDNA are well- 
known in the art and are described, for example, in J. 
Sambrook et al., "Molecular Cloning: A Laboratory 
Manual," Vol. 1, ch. 5, "Enzymes Used in Molecular
Cloning."

D. Insertion of Cleaved cDNA into a Vector 

The cDNA sample cleaved with the first and
second restriction endonucleases is then inserted into a
vector. A suitable vector is the plasmid pBC SK+ that
has been cleaved with the restriction endonucleases ClaI
and NotI. The vector contains a bacteriophage-specific
promoter. Typically, the promoter is a T3 promoter or a
T7 promoter. A preferred promoter is bacteriophage T3
promoter. The cleaved cDNA is inserted into the vector
in an orientation that is antisense with respect to the
bacteriophage-specific promoter.

E. Transformation of a Suitable Host Cell 

The vector into which the cleaved DNA has been 
inserted is then used to transform a suitable host cell 
that can be efficiently transformed or transfected by the 
vector containing the insert. Suitable host cells for
cloning are described, for example, in Sambrook et al.,
"Molecular Cloning: A Laboratory Manual," supra.
Typically, the host cell is prokaryotic. A particularly
suitable host cell is a strain of E. coli. A suitable
E. coli strain is MC1061. Preferably, a small aliquot is
also used to transform E. coli strain XL1-Blue so that
the percentage of clones with inserts is determined from
the relative percentages of blue and white colonies on
X-gal plates. Only libraries with in excess of 5 x 10^5
recombinants are typically acceptable.

F. Generation of Linearized Fragments 

Plasmid preparations, typically as minipreps, 
are then made from each of the cDNA libraries. 
Linearized fragments are then generated by digestion with 
at least one restriction endonuclease that is different 
from the first and second restriction endonucleases 
discussed above. Preferably, an aliquot of each of the
cloned inserts is divided into two pools, one of which is
cleaved with XhoI and the second with SalI. The pools of
linearized plasmids are combined, mixed, then divided
into thirds. The thirds are digested with HindIII,
BamHI, and EcoRI. This procedure is followed because, in
order to generate antisense transcripts of the inserts 
with T3 RNA polymerase, the template must first be 
cleaved with a restriction endonuclease that cuts within 
flanking sequences but not within the inserts themselves. 
Given that the average length of the 3'-terminal MspI
fragments is 256 base pairs, approximately 6% of the
inserts contain sites for any enzyme with a hexamer 
recognition sequence. Those inserts would be lost to 
further analysis were only a single enzyme utilized. 
Hence, it is preferable to divide the reaction so that 
only one of either of two enzymes is used for 
linearization of each half reaction. Only inserts 
containing sites for both enzymes (approximately 0.4%) 
are lost from both halves of the samples. Similarly, 
each cRNA sample is contaminated to a different extent 
with transcripts from insertless plasmids, which could 
lead to variability in the efficiency of the later 
polymerase chain reactions for different samples because 
of differential competition for primers. Cleavage of 
thirds of the samples with one of three enzymes that have 
single targets in pBC SK+ between its ClaI and NotI sites
eliminates the production of transcripts containing
binding sites for the eventual 5' primers in the PCR
process from insertless plasmids. The use of three
enzymes on thirds of the reaction reduces the use of 
insert-containing sequences that also contain sites for 
the enzyme while solving the problem of possible 
contamination of insertless sequences. If only one 
enzyme were used, about 10% of the insert-containing 
sequences would be lost, but this is reduced to about 
0.1%, because only those sequences that fail to be 
cleaved by all three enzymes are lost. 
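
The loss estimates quoted in this section can be checked with simple arithmetic. The sketch below is illustrative only and rests on stated simplifying assumptions: sequence is treated as random with equal base frequencies, the chance of a fragment containing a given site is approximated as fragment length divided by 4^(site length), and sites for different enzymes are treated as independent. The function name is hypothetical.

```python
def site_probability(fragment_length_bp: int, site_length: int) -> float:
    """Approximate chance that a random fragment contains a given recognition site."""
    return fragment_length_bp / 4 ** site_length

# One hexamer-recognizing enzyme on an average 256-bp insert: about 6%.
p_single = site_probability(256, 6)

# Two-enzyme split: an insert is lost from both halves only if it has both sites.
p_both = p_single ** 2

# Three-enzyme split, starting from the text's figure of about 10% lost per enzyme.
p_all_three = 0.10 ** 3

print(f"single enzyme: {p_single:.2%}")
print(f"both of two enzymes: {p_both:.2%}")
print(f"all three enzymes: {p_all_three:.2%}")
```

Under these assumptions the figures come out near 6%, 0.4%, and 0.1%, in line with the values given in the text.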

G. Generation of cRNA

The next step is the generation of a cRNA
preparation of antisense cRNA transcripts. This is
performed by incubation of the linearized fragments with
an RNA polymerase capable of initiating transcription
from the bacteriophage-specific promoter. Typically, as
discussed above, the promoter is a T3 promoter, and the
polymerase is therefore T3 RNA polymerase. The
polymerase is incubated with the linearized fragments and 
the four ribonucleoside triphosphates under conditions 
suitable for synthesis. 

H. Transcription of First-Strand cDNA

The cRNA preparation is then divided into 
sixteen subpools. First-strand cDNA is then transcribed 
from each subpool, using a thermostable reverse 
transcriptase and a primer as described below. A 
preferred transcriptase is the recombinant reverse
transcriptase from Thermus thermophilus, known as rTth,
available from Perkin-Elmer (Norwalk, CT). This enzyme
is also known as an RNA-dependent DNA polymerase. With
this reverse transcriptase, annealing is performed at
60°C, and the transcription reaction at 70°C. This
promotes high fidelity complementarity between the primer
and the cRNA. The primer used is one of the sixteen
primers whose 3'-terminus is -N-N, wherein N is one of
the four deoxyribonucleotides A, C, G, or T, the primer
being at least 15 nucleotides in length, corresponding in
sequence to the 3'-end of the bacteriophage-specific
promoter, and extending across into at least the first
two nucleotides of the cRNA.

Where the bacteriophage-specific promoter is
the T3 promoter, the primers typically have the sequence
A-G-G-T-C-G-A-C-G-G-T-A-T-C-G-G-N-N (SEQ ID NO: 3).
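
As an illustration of how the sixteen subpool primers relate to SEQ ID NO: 3, the following sketch enumerates them. It is a minimal example only; the names `PRIMER_CORE` and `first_strand_primers` are hypothetical, and the fixed sixteen-residue portion is copied from the sequence printed above.

```python
from itertools import product

# Fixed portion of SEQ ID NO: 3 as printed in the text (16 residues).
PRIMER_CORE = "AGGTCGACGGTATCGG"

def first_strand_primers(core: str = PRIMER_CORE) -> dict:
    """Map each -N-N dinucleotide to its full primer, one per cRNA subpool."""
    return {
        "".join(nn): core + "".join(nn)
        for nn in product("ACGT", repeat=2)
    }

subpool_primers = first_strand_primers()
for tail, primer in subpool_primers.items():
    print(tail, primer)
```

Each primer is 18 nucleotides long, which satisfies the stated minimum of 15, and the two variable 3'-terminal residues are the ones that extend into the cRNA insert.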

I. PCR Reaction 

The next step is the use of the product of 
transcription in each of the sixteen subpools as a 
template for a polymerase chain reaction with primers as 
described below to produce polymerase chain reaction 
amplified fragments. 

The primers used are: (a) a 3'-primer that
corresponds in sequence to a sequence in the vector
adjoining the site of insertion of the cDNA sample in the
vector; and (b) a 5'-primer selected from the group
consisting of: (i) the primer from which first-strand
cDNA was made for that subpool; (ii) the primer from
which the first-strand cDNA was made for that subpool
extended at its 3'-terminus by an additional residue -N,
where N can be any of A, C, G, or T; and (iii) the primer
used for the synthesis of first-strand cDNA for that
subpool extended at its 3'-terminus by two additional
residues -N-N, wherein N can be any of A, C, G, or T.

When the vector is the plasmid pBC SK+ cleaved
with ClaI and NotI, a suitable 3'-primer is G-A-A-C-A-A-
A-A-G-C-T-G-G-A-G-C-T-C-C-A-C-C-G-C (SEQ ID NO: 4).
Where the bacteriophage-specific promoter is the T3
promoter, suitable 5'-primers have the sequences A-G-G-T-
C-G-A-C-G-G-T-A-T-C-G-G-N-N (SEQ ID NO: 3), A-G-G-T-C-G-
A-C-G-G-T-A-T-C-G-G-N-N-N (SEQ ID NO: 5), or A-G-G-T-C-G-
A-C-G-G-T-A-T-C-G-G-N-N-N-N (SEQ ID NO: 6).

Typically, PCR is performed in the presence of
35S-dATP using a PCR program of 15 seconds at 94°C for
denaturation, 15 seconds at 60°C for annealing, and 30
seconds at 72°C for synthesis on a Perkin-Elmer 9600
apparatus (Perkin-Elmer Cetus, Norwalk, CT). The high
temperature annealing step minimizes artifactual
mispriming by the 5'-primer at its 3'-end and promotes
high fidelity copying.

Alternatively, the PCR amplification can be
carried out in the presence of a 32P-labeled
deoxyribonucleoside triphosphate, such as [32P]dCTP.
However, it is generally preferred to use a 35S-labeled
deoxyribonucleoside triphosphate for maximum resolution.
Other detection methods, including nonradioactive labels, 
can also be used. 

This series of reactions produces 16, 64, and
256 product pools for the three sets of 5'-primers. It
produces 16 product pools for the primer that is the same
as the primer from which first-strand cDNA was made. It
produces 64 product pools for the primer extended at its
3'-terminus by an additional residue -N, where N can be
any of the four nucleotides. It produces 256 product pools
for the primer extended at its 3'-terminus by two
additional residues -N-N, where N again can be any of the
four nucleotides.

The process of the present invention can be
extended by using longer sets of 5'-primers extended at
their 3'-end by additional nucleotides. For example, a
primer with the 3'-terminus -N-N-N-N-N would give 1024
products.
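
The pool counts follow directly from the number of variable 3'-terminal positions, since each position can hold any of four nucleotides. A brief sketch, with a hypothetical function name, makes the relationship explicit:

```python
def product_pools(variable_positions: int) -> int:
    """Number of distinct 5'-primers (and hence product pools) for a given
    number of variable 3'-terminal -N positions."""
    return 4 ** variable_positions

# -N-N, -N-N-N, -N-N-N-N, and the extended -N-N-N-N-N case.
counts = {k: product_pools(k) for k in range(2, 6)}
print(counts)
```

Each added position multiplies the number of pools by four, so resolution is gained at the cost of four times as many reactions.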

J. Electrophoresis 

The polymerase chain reaction amplified
fragments are then resolved by electrophoresis to display
bands representing the 3'-ends of mRNAs present in the
sample.



Electrophoretic techniques for resolving PCR 
amplified fragments are well-understood in the art and 
need not be further recited here. The corresponding 
products are resolved in denaturing DNA sequencing gels 
and visualized by autoradiography. For the particular 
vector system described herein, the gels are run so that 
the first 140 base pairs run off their bottom, since 
vector-related sequences increase the length of the cDNAs 
by 140 base pairs. This number can vary if other vector 
systems are employed, and the appropriate electrophoresis 
conditions so that vector-related sequences run off the 
bottom of the gels can be determined from a consideration 
of the sequences of the vector involved. Typically, each 
reaction is run on a separate denaturing gel, so that at 
least two gels are used. It is preferred to perform a 
series of reactions in parallel, such as from different 
tissues, and resolve all of the reactions using the same 
primer on the same gel. A substantial number of 
reactions can be resolved on the same gel. Typically, as 
many as thirty reactions can be resolved on the same gel 
and compared. As discussed below, this provides a way of 
determining tissue-specific mRNAs. 

Typically, autoradiography is used to detect 
the resolved cDNA species. However, other detection 
methods, such as phosphorimaging or fluorescence, can 
also be used, and may provide higher sensitivity in 
certain applications. 



According to the scheme, the cDNA libraries
produced from each of the mRNA samples contain copies of
the extreme 3'-ends from the most distal site for MspI to
the beginning of the poly(A) tail of all poly(A)+ mRNAs
in the starting RNA sample approximately according to the
initial relative concentrations of the mRNAs. Because
both ends of the inserts for each species are exactly
defined by sequence, their lengths are uniform for each
species, allowing their later visualization as discrete
bands on a gel, regardless of the tissue source of the
mRNA.
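
Why each mRNA species yields an insert of one fixed length can be seen from a small sketch. The code below is illustrative only: the function name, the `min_tail` threshold, and the example sequence are hypothetical, and the only enzymology assumed is that MspI recognizes CCGG and cuts after the first C.

```python
MSPI_SITE = "CCGG"  # MspI recognition sequence; cleavage is C^CGG

def three_prime_insert(mrna: str, min_tail: int = 10):
    """Return the segment running from the 3'-most MspI cut to the start of
    the poly(A) tail, or None if there is no usable tail or no MspI site."""
    seq = mrna.upper()
    body_end = len(seq.rstrip("A"))          # first position of the poly(A) tail
    if len(seq) - body_end < min_tail:
        return None                          # no recognizable poly(A) tail
    site = seq.rfind(MSPI_SITE, 0, body_end) # most distal (3'-most) site
    if site == -1:
        return None                          # species not captured with MspI
    return seq[site + 1:body_end]            # insert begins with the CGG overhang

# Hypothetical mRNA with two MspI sites and a 20-residue poly(A) tail.
example = "ATGCCGGTTTACCGGAGTCTAGC" + "A" * 20
insert = three_prime_insert(example)
print(insert, len(insert))
```

Because both boundaries are fixed by sequence, every copy of a given species gives the same insert length, which is what allows it to appear as a single discrete band. Species for which the function returns None correspond to the mRNAs that the text suggests recovering with an alternative first enzyme.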

The use of successive steps with lengthening
primers to survey the cDNAs essentially acts like a nested
PCR. These steps enhance quality control and diminish
the background that potentially could result from
amplification of untargeted cDNAs. In a preferred
embodiment, the second reverse transcription step
subdivides each cRNA sample into sixteen subpools,
utilizing a primer that anneals to the sequences derived
from pBC SK+ but extends across the CGG of the
non-regenerated MspI site and includes two nucleotides
(-N-N) of the insert. This step segregates the starting
population of potentially 50,000 to 100,000 mRNAs into
sixteen subpools of approximately 3,000 to 6,000 members 
each. In serial iterations of the subsequent PCR step, 
in which radioactive label is incorporated into the 
products for their autoradiographic visualization, those 
pools are further segregated by division into four or 
sixteen subsubpools by using progressively longer 5'- 
primers containing three or four nucleotides of the 
insert. 
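
The subpool sizes given above can be reproduced with a short calculation. The sketch assumes, purely for illustration, that mRNA species are distributed evenly across the possible 3'-terminal primer extensions; real populations will deviate from this. The function name is hypothetical.

```python
def pool_size_range(n_low: int, n_high: int, variable_positions: int):
    """Expected number of mRNA species per pool for a given primer extension,
    assuming species are spread evenly across all 4**k pools."""
    pools = 4 ** variable_positions
    return n_low / pools, n_high / pools

for k in (2, 3, 4):
    low, high = pool_size_range(50_000, 100_000, k)
    print(f"{k} variable positions: {4 ** k} pools, "
          f"about {low:.0f} to {high:.0f} species each")
```

With two variable positions this gives roughly 3,000 to 6,000 species per subpool, matching the text; three or four positions bring the expected number down to the order of hundreds, which is closer to what a sequencing gel lane can resolve.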

By first demanding by high temperature
annealing a high fidelity 3'-end match at the reverse
transcription step in the -N-N positions, and
subsequently demanding again such high fidelity matching
into -N-N-N or -N-N-N-N iterations, bleedthrough from
mismatched priming at the -N-N positions is drastically
minimized.

The steps of the process beginning with
dividing the cRNA preparation into sixteen subpools and
transcribing first-strand cDNA from each subpool can be
performed separately as a method of simultaneous
sequence-specific identification of mRNAs corresponding
to members of an antisense cRNA pool representing the
3'-ends of a population of mRNAs.



II. APPLICATIONS OF THE METHOD FOR DISPLAY OF mRNA
PATTERNS 

The method described above for the detection of 
patterns of mRNA expression in a tissue and the resolving 
of these patterns by gel electrophoresis has a number of 
applications. One of these applications is its use for 
the detection of a change in the pattern of mRNA 
expression in a tissue associated with a physiological or 
pathological change. In general, this method comprises: 

(1) obtaining a first sample of a tissue that 
is not subject to the physiological or pathological 
change; 

(2) determining the pattern of mRNA expression 
in the first sample of the tissue by performing the 
method of simultaneous sequence-specific identification 
of mRNAs corresponding to members of an antisense cRNA 
pool representing the 3'-ends of a population of mRNAs as
described above to generate a first display of bands
representing the 3'-ends of mRNAs present in the first
sample; 

(3) obtaining a second sample of the tissue 
that has been subject to the physiological or 
pathological change; 

(4) determining the pattern of mRNA expression 
in the second sample of the tissue by performing the 
method of simultaneous sequence-specific identification
of mRNAs corresponding to members of an antisense cRNA
pool representing the 3'-ends of a population of mRNAs as
described above to generate a second display of bands
representing the 3'-ends of mRNAs present in the second
sample; and 

(5) comparing the first and second displays to 
determine the effect of the physiological or pathological 
change on the pattern of mRNA expression in the tissue. 

Typically, the comparison is made in adjacent 
lanes of a single gel. 
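
The comparison in step (5) can be thought of as a comparison of two sets of bands. The sketch below is one hypothetical way to express that comparison once band positions and intensities have been read from a gel; the data structure, the `fold` threshold, and the example values are all assumptions made for illustration and do not come from the method as described.

```python
def compare_displays(first: dict, second: dict, fold: float = 2.0) -> dict:
    """Compare two displays given as {fragment length in nt: band intensity}.

    Returns bands present only in the first display, bands present only in
    the second, and shared bands whose intensity differs by at least `fold`.
    """
    shared = set(first) & set(second)
    return {
        "lost": sorted(set(first) - set(second)),
        "gained": sorted(set(second) - set(first)),
        "changed": sorted(
            length for length in shared
            if max(first[length], second[length])
            >= fold * min(first[length], second[length])
        ),
    }

# Hypothetical band readings for one primer pair in two adjacent lanes.
control = {180: 1.0, 233: 4.0, 310: 2.0}
treated = {180: 1.1, 310: 5.0, 402: 0.8}
result = compare_displays(control, treated)
print(result)
```

Running both samples with the same primer pair in adjacent lanes is what makes band lengths directly comparable, so that a difference can be attributed to the physiological or pathological change.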

The tissue can be derived from the central 
nervous system. In particular, it can be derived from a 
structure within the central nervous system that is the 
retina, cerebral cortex, olfactory bulb, thalamus, 
hypothalamus, anterior pituitary, posterior pituitary, 
hippocampus, nucleus accumbens, amygdala, striatum, 
cerebellum, brain stem, suprachiasmatic nucleus, or 
spinal cord. When the tissue is derived from the central 
nervous system, the physiological or pathological change 
can be any of Alzheimer's disease, parkinsonism, 
ischemia, alcohol addiction, drug addiction, 
schizophrenia, amyotrophic lateral sclerosis, multiple 
sclerosis, depression, and bipolar manic-depressive 
disorder. Alternatively, the method of the present 
invention can be used to study circadian variation, 
aging, or long-term potentiation, the latter affecting 
the hippocampus. Additionally, particularly with 
reference to mRNA species occurring in particular 
structures within the central nervous system, the method 
can be used to study brain regions that are known to be 
involved in complex behaviors, such as learning and 
memory, emotion, drug addiction, glutamate neurotoxicity, 
feeding behavior, olfaction, viral infection, vision, and 
movement disorders. 



This method can also be used to study the 
results of the administration of drugs and/or toxins to
an individual by comparing the mRNA pattern of a tissue
before and after the administration of the drug or toxin.
Results of electroshock therapy can also be studied. 

Alternatively, the tissue can be from an organ 
or organ system that includes the cardiovascular system, 
the pulmonary system, the digestive system, the 
peripheral nervous system, the liver, the kidney, 
skeletal muscle, and the reproductive system, or from any 
other organ or organ system of the body. For example, 
mRNA patterns can be studied from liver, heart, kidney, 
or skeletal muscle. Additionally, for any tissue, 
samples can be taken at various times so as to discover a 
circadian effect of mRNA expression. Thus, this method 
can ascribe particular mRNA species to involvement in 
particular patterns of function or malfunction. 

The antisense cRNA pool representing the
3'-ends of mRNAs can be generated by steps (1)-(4) of the
method as described above in Section I.

Similarly, the mRNA resolution method of the 
present invention can be used as part of a method of 
screening for a side effect of a drug. In general, such 
a method comprises:

(1) obtaining a first sample of tissue from an 
organism treated with a compound of known physiological 
function; 

(2) determining the pattern of mRNA expression 
in the first sample of the tissue by performing the 
method of simultaneous sequence-specific identification 
of mRNAs corresponding to members of an antisense cRNA 
pool representing the 3'-ends of a population of mRNAs, 
as described above, to generate a first display of bands
representing the 3'-ends of mRNAs present in the first
sample;

(3) obtaining a second sample of tissue from
an organism treated with a drug to be screened for a side 
effect; 

(4) determining the pattern of mRNA expression 
in the second sample of the tissue by performing the 
method of simultaneous sequence-specific identification 
of mRNAs corresponding to members of an antisense cRNA 
pool representing the 3'-ends of a population of mRNAs,
as described above, to generate a second display of bands
representing the 3'-ends of mRNAs present in the second
sample; and 

(5) comparing the first and second displays in 
order to detect the presence of mRNA species whose 
expression is not affected by the known compound but is 
affected by the drug to be screened, thereby indicating a 
difference in action of the drug to be screened and the 
known compound and thus a side effect. 

In particular, this method can be used for
drugs affecting the central nervous system, such as
antidepressants, neuroleptics, tranquilizers,
anticonvulsants, monoamine oxidase inhibitors, and
stimulants. However, this method can in fact be used for
any drug that may affect mRNA expression in a particular 
tissue. For example, the effect on mRNA expression of 
anti-parkinsonism agents, skeletal muscle relaxants, 
analgesics, local anesthetics, cholinergics, 
antispasmodics, steroids, non-steroidal anti-inflammatory 
drugs, antiviral agents, or any other drug capable of 
affecting mRNA expression can be studied, and the effect 
determined in a particular tissue or structure.

A further application of the method of the
present invention is in obtaining the sequence of the
3'-ends of mRNA species that are displayed. In general, a
method of obtaining the sequence comprises:

(1) eluting at least one cDNA corresponding to
a mRNA from an electropherogram in which bands
representing the 3'-ends of mRNAs present in the sample
are displayed; 

(2) amplifying the eluted cDNA in a polymerase 
chain reaction; 

(3) cloning the amplified cDNA into a plasmid; 

(4) producing DNA corresponding to the cloned 
DNA from the plasmid; and 

(5) sequencing the cloned cDNA. 

The cDNA that has been excised can be amplified
with the primers previously used in the PCR step. The
cDNA can then be cloned into pCR II (Invitrogen, San
Diego, CA) by TA cloning and ligation into the vector.
Minipreps of the DNA can then be produced by standard
techniques from subclones and a portion denatured and
split into two aliquots for automated sequencing by the
dideoxy chain termination method of Sanger. A
commercially available sequencer can be used, such as an
ABI sequencer, for automated sequencing. This will allow
the determination of complementary sequences for most
cDNAs studied, in the length range of 50-500 bp, across
the entire length of the fragment.

These partial sequences can then be used to
scan genomic databases such as GenBank to recognize
sequence identities and similarities using programs such
as BLASTN and BLASTX. Because this method generates
sequences from only the 3'-ends of mRNAs, it is expected
that open reading frames (ORFs) would be encountered only
occasionally, as the 3'-untranslated regions of brain
mRNAs are on average longer than 1300 nucleotides (J.G.
Sutcliffe, supra). Potential ORFs can be examined for
signature protein motifs.

The cDNA sequences obtained can then be used to
design primer pairs for semiquantitative PCR to confirm
tissue expression patterns. Selected products can also
be used to isolate full-length cDNA clones for further
analysis. Primer pairs can be used for SSCP-PCR (single
strand conformation polymorphism-PCR) amplification of
genomic DNA. For example, such amplification can be
carried out from a panel of interspecific backcross mice
to determine linkage of each PCR product to markers
already linked. This can result in the mapping of new
genes and can serve as a resource for identifying
candidates for mapped mouse mutant loci and homologous
human disease genes. SSCP-PCR uses synthetic
oligonucleotide primers that amplify, via PCR, a small
(100-200 bp) segment (M. Orita et al., "Detection of
Polymorphisms of Human DNA by Gel Electrophoresis as
Single-Strand Conformation Polymorphisms," Proc. Natl.
Acad. Sci. USA 86: 2766-2770 (1989); M. Orita et al.,
"Rapid and Sensitive Detection of Point Mutations in DNA
Polymorphisms Using the Polymerase Chain Reaction,"
Genomics 5: 874-879 (1989)).



The excised fragments of cDNA can be 
radiolabeled by techniques well-known in the art for use 
in probing a northern blot or for in situ hybridization 
to verify mRNA distribution and to learn the size and 
prevalence of the corresponding full-length mRNA. The 
probe can also be used to screen a cDNA library to 
isolate clones for more reliable and complete sequence 
determination. The labeled probes can also be used for 
any other purpose, such as studying in vitro expression.






III. PANELS AND DEGENERATE MIXTURES OF PRIMERS 



Another aspect of the present invention is 
panels and degenerate mixtures of primers suitable for 
the practice of the present invention. These include:

(1) a panel of primers comprising 16 primers of
the sequences A-G-G-T-C-G-A-C-G-G-T-A-T-C-G-G-N-N (SEQ ID
NO: 3), wherein N is one of the four deoxyribonucleotides
A, C, G, or T; 

(2) a panel of primers comprising 64 primers of 
the sequences A-G-G-T-C-G-A-C-G-G-T-A-T-C-G-G-N-N-N (SEQ 
ID NO: 5), wherein N is one of the four
deoxyribonucleotides A, C, G, or T; 

(3) a panel of primers comprising 256 primers 
of the sequences A-G-G-T-C-G-A-C-G-G-T-A-T-C-G-G-N-N-N-N 
(SEQ ID NO: 6), wherein N is one of the four 
deoxyribonucleotides A, C, G, or T;

(4) a panel of primers comprising 12 primers 
of the sequences A-A-C-T-G-G-A-A-G-A-A-T-T-C-G-C-G-G-C-C- 
G-C-A-G-G-A-A-T-T-T-T-T-T-T-T-T-T-T-T-T-T-T-T-T-T-V-N 
(SEQ ID NO: 2), wherein V is a deoxyribonucleotide
selected from the group consisting of A, C, and G; and N 
is a deoxyribonucleotide selected from the group 
consisting of A, C, G, and T; and 

(5) a degenerate mixture of primers comprising 
a mixture of 12 primers of the sequences A-A-C-T-G-G-A-A- 
G-A-A-T-T-C-G-C-G-G-C-C-G-C-A-G-G-A-A-T-T-T-T-T-T-T-T-T- 
T-T-T-T-T-T-T-T-T-V-N (SEQ ID NO: 2), wherein V is a
deoxyribonucleotide selected from the group consisting of
A, C, and G; and N is a deoxyribonucleotide selected from
the group consisting of A, C, G, and T, each of the 12 
primers being present in about an equimolar quantity. 

The invention is illustrated by the following
Example. The Example is for illustrative purposes only
and is not intended to limit the invention.

Resolution of Brain mRNAs Using Primers Corresponding to
Sequences of Known Brain mRNAs of Different 
Concentrations 

To demonstrate the effectiveness of the method 
of the present invention, it was applied using 5'-primers
extended at their 3'-ends by two nucleotides and
corresponding to the sequence of known brain mRNAs of
different concentrations, such as neuron-specific enolase
(NSE) at roughly 0.5% concentration (S. Forss-Petter et
al., "Neuron-Specific Enolase: Complete Structure of Rat
mRNA, Multiple Transcriptional Start Sites and Evidence
for Translational Control," J. Neurosci. Res. 16: 141-156
(1986)), RC3 at about 0.01%, and somatostatin at 0.001%
(G.H. Travis & J.G. Sutcliffe, "Phenol Emulsion-Enhanced
DNA-Driven Subtractive cDNA Cloning: Isolation of Low-
Abundance Monkey Cortex-Specific mRNAs," Proc. Natl.
Acad. Sci. USA 85: 1696-1700 (1988)) to compare cDNAs
made from libraries constructed from cerebral cortex, 
striatum, cerebellum and liver RNAs made as described 
above. On short autoradiographic exposures from any 
particular RNA sample, 50-100 bands were obtained. Bands
were absolutely reproducible in duplicate samples.
Approximately two-thirds of the bands differed between
brain and liver samples, including the bands of the
correct lengths corresponding to the known brain-specific
mRNAs. This was confirmed by excision of the bands from
the gels, amplification and sequencing. Only a few bands
differed among samples for various brain regions for any
particular primer, although some band intensities
differed.

The band corresponding to NSE, a relatively
prevalent mRNA species, appeared in all of the brain
samples but not in the liver samples, but was not
observed when any of the last three single nucleotides
within the four-base 3'-terminal sequence -N-N-N-N was
changed in the synthetic 5'-primer. When the first N was
changed, a small amount of bleedthrough was detected. For
the known species, the intensity of the autoradiographic
signal was roughly proportional to mRNA prevalence, and
mRNAs with concentrations of one part in 10^5 or greater of
the poly(A)+ RNA were routinely visible, with the
occasional problem that cDNAs that migrated close to more
intense bands were obscured.

A sample of the data is shown in Figure 2. In
the 5 gel lanes on the left, cortex cRNA was the substrate
for reverse transcription with the primer A-G-G-T-C-G-A-
C-G-G-T-A-T-C-G-G-N-N (SEQ ID NO: 3), where -N-N is -C-T
(primer 118), -G-T (primer 116) or -C-G (primer 106).
The PCR amplification used primers A-G-G-T-C-G-A-C-G-G-T-
A-T-C-G-G-N-N-N-N (SEQ ID NO: 6), where -N-N-N-N is -C-T-
A-C (primer 128), -C-T-G-A (primer 127), -C-T-G-C (primer
111), -G-T-G-C (primer 134), and -C-G-G-C (primer 130),
as indicated in Figure 2. Primers 118 and 111 match the
sequence of the two and four nucleotides, respectively,
downstream from the MspI site located nearest the 3'-
end of the NSE mRNA sequence. Primer 127 is mismatched
with the NSE sequence in the last (-1) position, primer
128 in the next-to-last (-2) position, primers 106 and
130 in the -3 position, and primers 116 and 134 in the -4
position. Primer 134 extended two nucleotides further
upstream than the others shown here; hence its PCR
products are two nucleotides longer than the
products in other lanes.
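
The mismatch positions listed above can be checked mechanically. The following Python sketch is illustrative only: it takes the -N-N or -N-N-N-N extension of each primer named in the text and reports where it departs from the NSE-matching extension -C-T-G-C, numbering positions from -4 (first variable base) to -1 (3'-terminal base) as in the text. The function name and data layout are assumptions made for this illustration and are not part of the method as described.

```python
# Illustrative sketch; names and data layout are hypothetical.
NSE_EXTENSION = "CTGC"  # four bases matched by primer 111 (first two by primer 118)

# Primer extensions as given in the text (primer number -> variable 3' bases).
PRIMER_EXTENSIONS = {
    118: "CT", 116: "GT", 106: "CG",        # reverse-transcription primers (-N-N)
    128: "CTAC", 127: "CTGA", 111: "CTGC",  # PCR primers (-N-N-N-N)
    134: "GTGC", 130: "CGGC",
}

def mismatch_positions(extension, reference=NSE_EXTENSION):
    """Return the positions (-4..-1) where extension differs from reference.

    Position -4 is the first base of the four-base extension and -1 is the
    3'-terminal base; a two-base extension covers positions -4 and -3 only.
    """
    return [i - len(reference)
            for i, base in enumerate(extension)
            if base != reference[i]]

for number, ext in sorted(PRIMER_EXTENSIONS.items()):
    print(number, ext, mismatch_positions(ext) or "match")
```

Run as written, the sketch reports primers 111 and 118 as matches and places the single mismatch of each remaining primer at the position stated in the text.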

In each lane, 50-100 bands were visible in 15-
minute exposures using 32P-dCTP to radiolabel the
products. These bands were apparently distinct for each
primer pair, with the exception that a subset of the 118-
111 bands appeared more faintly in the 116-134 lane,
trailing by two nucleotides, indicating bleedthrough in
the -4 position.
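
To give a sense of scale, the primer families of SEQ ID NO: 3 and SEQ ID NO: 6 can be enumerated directly. The Python sketch below is a rough illustration: it expands the variable 3' positions over A, C, G and T and multiplies the number of PCR primers by the 50-100 bands per lane reported here. The assumption that every primer would give a similar number of distinct bands is an extrapolation from the few primer pairs actually shown, and the function name is hypothetical.

```python
from itertools import product

COMMON_5PRIME = "AGGTCGACGGTATCGG"  # shared portion of SEQ ID NO: 3 and SEQ ID NO: 6

def expand_primers(n_variable):
    """List every primer formed by appending n_variable bases (A, C, G or T)."""
    return [COMMON_5PRIME + "".join(bases)
            for bases in product("ACGT", repeat=n_variable)]

rt_primers = expand_primers(2)   # SEQ ID NO: 3 family (-N-N)
pcr_primers = expand_primers(4)  # SEQ ID NO: 6 family (-N-N-N-N)

# Assumption: the 50-100 bands per lane seen here hold for every PCR primer.
low = 50 * len(pcr_primers)
high = 100 * len(pcr_primers)

print(len(rt_primers), len(pcr_primers), low, high)
# prints: 16 256 12800 25600
```

Under that assumption, a full set of -N-N-N-N primers would display on the order of 10^4 bands per tissue; the figure is an estimate, not a measured result.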

The 118-111 primer set was used again on
separate cortex (CX) and liver (LV) cRNAs. The cortex
pattern was identical to that in lane 118-111,
demonstrating reproducibility. The liver pattern
differed from CX in the majority of species. The
asterisk indicates the position of the NSE product.
Analogous primer sets detected RC3 and somatostatin
(somat) products (asterisks) in CX but not LV lanes. The
relative band intensities of a given PCR product can be
compared between lanes using the same primer set, but not
between lanes using different primer sets.
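
The lane-to-lane comparison described above amounts to a set comparison of band positions. The Python sketch below shows one way this could be tabulated, assuming each lane has been reduced to a mapping from band length (in nucleotides) to relative intensity. The band lengths and intensities are made-up values chosen only to exercise the code; they are not measurements from Figure 2, and the function name is hypothetical.

```python
def compare_lanes(cortex, liver):
    """Split band positions into cortex-only, liver-only and shared lists.

    Each lane maps a band length in nucleotides to a relative intensity.
    Both lanes must come from the same primer set for the comparison
    to be meaningful.
    """
    cx, lv = set(cortex), set(liver)
    return {
        "cortex_only": sorted(cx - lv),
        "liver_only": sorted(lv - cx),
        "shared": sorted(cx & lv),
    }

# Made-up band lengths and intensities, for illustration only.
cortex_lane = {112: 0.9, 157: 0.2, 203: 0.05, 260: 0.4}
liver_lane = {112: 0.8, 181: 0.3, 260: 0.1}

result = compare_lanes(cortex_lane, liver_lane)
print(result)
```

A band such as the NSE product would fall in the cortex-only list. Intensities are carried along so that shared bands can be compared, which is valid only within a single primer set.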

This example demonstrates the feasibility and 
reproducibility of the method of the present invention 
and its ability to resolve different mRNAs. It further 
demonstrates that prevalence of particular mRNA species 
can be estimated from the intensity of the 
autoradiographic signal. The assay allows mRNAs present 
in both high and low prevalence to be detected 
simultaneously.

ADVANTAGES OF THE PRESENT INVENTION 

The present method can be used to identify
genes whose expression is altered during neuronal
development, in models of plasticity and regeneration, in
response to chemical or electrophysiological challenges
such as neurotoxicity and long-term potentiation, and in
response to behavioral, viral, and drug/alcohol paradigms,
the occurrence of cell death or apoptosis, aging,
pathological conditions, and other conditions affecting
mRNA expression. Although the method is particularly
useful for studying gene expression in the nervous
system, it is not limited to the nervous system and can be
used to study mRNA expression in any tissue. The method
allows the visualization of nearly every mRNA expressed
by a tissue as a distinct band on a gel whose intensity
corresponds roughly to the concentration of the mRNA.

The method has the advantage that it does not 
depend on potentially irreproducible mismatched random 
priming, so that it provides a high degree of accuracy 
and reproducibility. Moreover, it reduces the 
complications and imprecision generated by the presence 
of concurrent bands of different length resulting from 
the same mRNA species as the result of different priming 
events. In methods using random priming, such concurrent 
bands can occur and are more likely to occur for mRNA 
species of high prevalence. In the present method, such
concurrent bands are avoided. 

The method provides sequence-specific 
information about the mRNA species and can be used to 
generate primers, probes, and other specific sequences. 

Although the present invention has been 
described in considerable detail, with reference to 
certain preferred versions thereof, other versions are 
possible. Therefore, the spirit and scope of the 
appended claims should not be limited to the description 
of the preferred versions contained herein.

SEQUENCE LISTING

(1) GENERAL INFORMATION:

     (i) APPLICANT: Erlander, Mark G.
                    Sutcliffe, J. G.

    (ii) TITLE OF INVENTION: Method for Simultaneous Identification of
         Differentially Expressed mRNAs and Measurement of Relative
         Concentrations

   (iii) NUMBER OF SEQUENCES: 6

    (iv) CORRESPONDENCE ADDRESS:
          (A) ADDRESSEE: Sheldon & Mak
          (B) STREET: 225 South Lake Avenue, Ninth Floor
          (C) CITY: Pasadena
          (D) STATE: California
          (E) COUNTRY: USA
          (F) ZIP: 91101

     (v) COMPUTER READABLE FORM:
          (A) MEDIUM TYPE: Floppy disk
          (B) COMPUTER: IBM PC compatible
          (C) OPERATING SYSTEM: PC-DOS/MS-DOS
          (D) SOFTWARE: PatentIn Release #1.0, Version #1.25

    (vi) CURRENT APPLICATION DATA:
          (A) APPLICATION NUMBER: US 08/ ,
          (B) FILING DATE: 12-NOV-1993
          (C) CLASSIFICATION:

  (viii) ATTORNEY/AGENT INFORMATION:
          (A) NAME: Farber, Michael B
          (B) REGISTRATION NUMBER: 32,512
          (C) REFERENCE/DOCKET NUMBER: 10178

    (ix) TELECOMMUNICATION INFORMATION:
          (A) TELEPHONE: (818) 796-4000
          (B) TELEFAX: (818) 795-6321

(2) INFORMATION FOR SEQ ID NO: 1:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 14 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: DNA (genomic)

   (iii) HYPOTHETICAL: NO

    (iv) ANTI-SENSE: NO

    (vi) ORIGINAL SOURCE:
          (A) ORGANISM: Synthetic primer

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:

AACTGGAAGA ATTC                                              14

(2) INFORMATION FOR SEQ ID NO: 2:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 47 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: DNA (genomic)

   (iii) HYPOTHETICAL: NO

    (iv) ANTI-SENSE: NO

    (vi) ORIGINAL SOURCE:
          (A) ORGANISM: Synthetic primer

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:

AACTGGAAGA ATTCGCGGCC GCAGGAATTT TTTTTTTTTT TTTTTVN          47

(2) INFORMATION FOR SEQ ID NO: 3:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 18 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: DNA (genomic)

   (iii) HYPOTHETICAL: NO

    (iv) ANTI-SENSE: NO

    (vi) ORIGINAL SOURCE:
          (A) ORGANISM: Synthetic primer

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:

AGGTCGACGG TATCGGNN                                          18

(2) INFORMATION FOR SEQ ID NO: 4:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 24 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: DNA (genomic)

   (iii) HYPOTHETICAL: NO

    (iv) ANTI-SENSE: NO

    (vi) ORIGINAL SOURCE:
          (A) ORGANISM: Synthetic primer

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:

GAACAAAAGC TGGAGCTCCA CCGC                                   24

(2) INFORMATION FOR SEQ ID NO: 5:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 19 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: DNA (genomic)

   (iii) HYPOTHETICAL: NO

    (iv) ANTI-SENSE: NO

    (vi) ORIGINAL SOURCE:
          (A) ORGANISM: Synthetic primer

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:

AGGTCGACGG TATCGGNNN                                         19

(2) INFORMATION FOR SEQ ID NO: 6:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 20 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: DNA (genomic)

   (iii) HYPOTHETICAL: NO

    (iv) ANTI-SENSE: NO

    (vi) ORIGINAL SOURCE:
          (A) ORGANISM: Synthetic primer

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:

AGGTCGACGG TATCGGNNNN                                        20